Abstract: An upper limit is given to the amount of quantum information that can be
transmitted reliably down a noisy, decoherent quantum channel. A class of quantum
error-correcting codes is presented that allow the information transmitted to attain this
limit. The result is the quantum analog of Shannon's bound and code for the noisy classical
channel.



The 'quantum' in quantum mechanics means 'how much' — in quantum mechanics, 
classically continuous variables such as energy, angular momentum and charge come in 
discrete units called quanta. This discrete character of quantum-mechanical systems such 
as photons, atoms, and spins allows them to register ordinary digital information. A left- 
circularly polarized photon can encode a 0, for example, while a right-circularly polarized 
photon can encode a 1. Quantum systems can also register information in ways that
classical digital systems cannot: a transversely polarized photon is in a quantum superposition
of left and right polarization, and in some sense encodes both 0 and 1 at the same time.
Even more surprising from the classical perspective are so-called entangled states, in which
two or more quantum systems are in superpositions of correlated states, so that two
photons can encode, for example, 00 and 11 at once. Such entangled states behave in ways
that apparently violate classical intuitions about locality and causality (without, of course,
actually violating physical laws).

Information stored on quantum systems that can exist in superpositions and entangled 
states is called quantum information. The unit of quantum information is the quantum bit, 
or qubit (pronounced 'Q-bit'), the amount of quantum information that can be registered
on a single two-state variable such as a photon's polarization or a neutron's spin. This paper 
puts fundamental limits on the amount of quantum information that can be transmitted 
reliably along a noisy communication channel such as an optical fiber. Theorems are 
presented that limit the rate at which arbitrary superpositions of qubits can be sent down 
a channel with given noise characteristics, and encoding schemes are presented that attain 
that limit. 

It is important to compare the results presented here — the use of a quantum channel 
to transmit quantum information — with schemes that use quantum channels to transmit 
classical information, as in Caves and Drummond's comprehensive review of quantum 
limits on bosonic communication rates. The limit to the rate at which arbitrary sequences
of ordinary classical bits, suitably encoded as quantum states, can be transmitted down 
a quantum channel such as an optical fiber is given by Holevo's theorem. In contrast, 
the results presented here limit the rate at which arbitrary superpositions of sequences of 
quantum bits can be sent reliably down a noisy, decoherent quantum channel. As such, 
the theorems presented in this paper are complementary to the results of Schumacher
and Jozsa on the noiseless quantum channel. Any channel that can transmit quantum
information can be used to transmit classical information as well. It is possible, however, 
for a channel to be able to transmit classical information without being able to transmit 
quantum information: examples of such completely decoherent channels will be discussed 
below. 

The difference between quantum and classical information does not arise from a fun- 
damental physical distinction between the systems that register, process, and transmit 
that information. As just noted, quantum channels can be used to transmit classical 
information. And after all, 'classical' information-registering systems such as capacitors 
and neurons are at bottom quantum-mechanical. The difference arises from the condi- 
tions under which such systems operate. When properly isolated from their environment, 
photons and atoms can exist in superpositions and entangled states for long periods of 
time, with experimentally measurable results. Capacitors and neurons, in contrast,
interact strongly with a thermal environment, which prevents them from exhibiting coherent
quantum effects. As a result, quantum information can be used to perform tasks that 
classical information cannot. 

A full theory of quantum information and its properties does not yet exist. However, 
the ability to transmit and process quantum information reliably provides the solution to 
problems to which no classical solution is known: if entangled quantum bits can be
transmitted and received, quantum cryptographic techniques can be used to create provably
secure shared keywords for unbreakable codes, while the ability to process quantum
information allows quantum computers efficiently to factorize large numbers and to simulate
local quantum systems.

For quantum information to prove useful, it must be transmitted and processed
reliably. Quantum superpositions and entangled states tend to be easily disrupted by noise
and by interactions with their environment, a process called decoherence. Until recently,
decoherence and noise seemed insurmountable obstacles to reliable quantum information
transmission and processing. However, in 1995, Shor exhibited the first quantum
error-correcting routine. Since then, several such routines have been proposed. All of these
routines have the feature, common to many classical error-correcting codes as well, that
the rate of transmission of quantum information goes to zero as the reliability of
transmission goes to one. This paper shows that arbitrarily complicated quantum states can
in principle be encoded, subjected to high levels of noise and decoherence, then decoded
to give a state arbitrarily close to the original state, all with a finite rate of transmission
of quantum information. The paper states and outlines the proof of theorems that put an
upper bound to the capacity of noisy, decoherent quantum channels to transmit quantum
information reliably, and exhibits a class of quantum codes that attain that bound.

1. Quantum Sources 

A quantum channel has a source that emits systems in quantum states (the signal)
to the channel, and a receiver that receives the noisy, decohered signal emitted by the 
channel. For example, the source could be a highly attenuated laser that emits individual 
monochromatic photons, the channel could be an optical fiber, and the receiver could be 
a photocell. Or the source could be a set of ions in an ion-trap quantum computer
that have been prepared by a sequence of laser pulses in an entangled state, the channel 
could be the ion trap in which the ions evolve over time, and the receiver could be a 
microscope to read out the states of the ions via laser-induced fluorescence. This second 
example indicates that a quantum channel can transmit quantum information from one
time to another as well as from one place to another. As Shannon emphasized, a computer 
memory is a communications channel. 

A more complete picture of a quantum channel is as follows (Figure 1): the input 
signal is some unknown quantum state; the input is fed into an encoder that transforms it 
into a redundant form; the encoded signal is sent down the channel, subjected to noise and 
decoherence; the noisy, decohered signal is then fed into a decoder that attempts to restore 
the original signal. Quantum encoding and decoding requires the ability to manipulate 
quantum states in a systematic fashion, for example, by using Kimble's photonic quantum
logic gates or Wineland's realization of the ion-trap quantum computer proposed by
Cirac and Zoller. From a practical point of view, such decoding and encoding may prove
the most difficult part of reliable quantum information transmission and processing. This 
paper will simply exhibit coding and decoding schemes that attain the channel capacity: 
it will not address how such schemes can be carried out in practice. 

In order to demonstrate the quantum analog of Shannon's noisy coding theorem, 
it's helpful to set up a quantum formalism that corresponds closely to the classical picture 
of a noisy channel. Quantum systems and quantum signals are described by states $|\psi\rangle$
in a Hilbert space $\mathcal{H}$, or more generally, by density matrices
$\rho \in \mathcal{H}^* \otimes \mathcal{H}$. A quantum ensemble
$\mathcal{E} = \{(|\psi_i\rangle, p_i)\}$ is a set of quantum states $|\psi_i\rangle$ belonging
to the same Hilbert space $\mathcal{H}$, together with their probabilities $p_i$. The
expectation value of a measurement on the ensemble corresponding to a Hermitian operator $M$ is
$\langle M\rangle_{\mathcal{E}} = \sum_i p_i \langle\psi_i|M|\psi_i\rangle = \mathrm{tr}\,M\rho_{\mathcal{E}}$,
where $\rho_{\mathcal{E}} = \sum_i p_i |\psi_i\rangle\langle\psi_i|$ is the density matrix
corresponding to the ensemble. The states $|\psi_i\rangle$ need be neither orthonormal nor
normalized, as long as $\sum_i p_i \langle\psi_i|\psi_i\rangle = \mathrm{tr}\,\rho_{\mathcal{E}} = 1$. That
is, a quantum ensemble is just the quantum analog of a classical ensemble, where care has 
been taken to take into account the inherently statistical nature of quantum mechanics. 

Two ensembles that have the same density matrix are statistically indistinguishable: 
no set of measurements can distinguish whether a sequence of states is drawn from one 
ensemble rather than the other. An example of statistically indistinguishable ensembles is 
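$\mathcal{E}_1 = \{(|\uparrow\rangle, 1/2), (|\downarrow\rangle, 1/2)\}$, and

$\mathcal{E}_2 = \{(|\uparrow\rangle, 1/3), ((1/2)|\uparrow\rangle + (\sqrt{3}/2)|\downarrow\rangle, 1/3), ((1/2)|\uparrow\rangle - (\sqrt{3}/2)|\downarrow\rangle, 1/3)\}$,

both with density matrices
$\rho = (1/2)|\uparrow\rangle\langle\uparrow| + (1/2)|\downarrow\rangle\langle\downarrow|$. Note that
an ensemble over a finite dimensional Hilbert space can contain an infinite number of states,
e.g., $\mathcal{E} = \{(e^{i\phi}|\uparrow\rangle, p(\phi) = 1/2\pi)\}$, in which case each state
is paired with a continuous probability density, $p(\phi)$, and
$\rho = \int_0^{2\pi} (1/2\pi)\, e^{i\phi}|\uparrow\rangle\langle\uparrow|e^{-i\phi}\, d\phi = |\uparrow\rangle\langle\uparrow|$.
Because of the inherently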
statistical nature of quantum mechanics, different quantum ensembles can be statistically 
indistinguishable, while two classical ensembles are statistically indistinguishable if and 
only if they are identical. 
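
The statistical indistinguishability of the two example ensembles can be checked numerically.
The following is a minimal sketch, assuming spin-up and spin-down are represented as the
vectors (1, 0) and (0, 1); the helper name `density_matrix` is hypothetical and not part of
the paper.

```python
import numpy as np

def density_matrix(ensemble):
    """Return sum_i p_i |psi_i><psi_i| for a list of (state, probability) pairs."""
    dim = len(ensemble[0][0])
    rho = np.zeros((dim, dim), dtype=complex)
    for state, prob in ensemble:
        psi = np.asarray(state, dtype=complex)
        rho += prob * np.outer(psi, psi.conj())
    return rho

# Assumed representation: |up> = (1, 0), |down> = (0, 1).
up = np.array([1.0, 0.0])
down = np.array([0.0, 1.0])

# The two-state ensemble and the three-state ensemble from the text.
ensemble_1 = [(up, 0.5), (down, 0.5)]
ensemble_2 = [
    (up, 1 / 3),
    (0.5 * up + (np.sqrt(3) / 2) * down, 1 / 3),
    (0.5 * up - (np.sqrt(3) / 2) * down, 1 / 3),
]

rho_1 = density_matrix(ensemble_1)
rho_2 = density_matrix(ensemble_2)

# True when the two ensembles share one density matrix.
same = np.allclose(rho_1, rho_2)
```

Since every expectation value depends on the ensemble only through its density matrix, equal
matrices mean no measurement statistics can tell the two ensembles apart.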

A particularly interesting type of continuous quantum ensemble is the uniform
ensemble over a Hilbert space $\mathcal{H}$,
$\mathcal{E}_{\mathcal{H}} = \{(|\psi\rangle \in \mathcal{H}, p_\psi = 1/\mathrm{vol}\,\mathcal{H})\}$,
where $\mathrm{vol}\,\mathcal{H}$ is the volume of the unit sphere in $\mathcal{H}$. This ensemble
contains every possible state and superposition of states in $\mathcal{H}$, all with equal
probabilities. The corresponding density matrix is
$\rho_{\mathcal{H}} = (1/d) \sum_{i=1}^{d} |\phi_i\rangle\langle\phi_i|$, where $d$ is the dimension
of $\mathcal{H}$ and $\{|\phi_i\rangle\}$ is an orthonormal basis for $\mathcal{H}$. If we wish to
transmit arbitrary superpositions of states down quantum channels, the sources of interest
are of the form $\mathcal{E}_{\mathcal{H}}$ for some $\mathcal{H}$.

Like Shannon, we will restrict our attention to stationary, ergodic sources. A
stationary source is one for which the probabilities for emitting states do not change over time;
an ergodic source is one in which each sub-sequence of states appears in longer sequences
with a frequency equal to its probability. (These assumptions are made for convenience
of analysis only: in fact, the inherently statistical nature of quantum mechanics makes
them less necessary in the quantum than in the classical case, and the results derived can
be generalized to non-stationary, non-ergodic sources.) We define a stationary, ergodic
ensemble over $N$ time steps as one whose density matrix is the $N$-fold tensor product of
its density matrix over a single time step: $\rho^N = \rho \otimes \rho \otimes \cdots \otimes \rho$.

There are many different quantum ensembles with density matrix
$\rho \otimes \cdots \otimes \rho$. But as noted by Schumacher and Jozsa, there is one ensemble
in particular that effectively contains all such ensembles. Let
$\rho = \sum_i p_i |\phi_i\rangle\langle\phi_i|$, where the $|\phi_i\rangle$ are orthonormal.
Consider the subspace $\mathcal{H}_N$ spanned by the 'high-probability' product states
$|\phi_{i_1}\rangle \cdots |\phi_{i_N}\rangle$ where each $|\phi_i\rangle$ occurs in the product
$\approx p_i N$ times. These states are the analog of high-probability
sequences of symbols for a classical source. The following theorem then follows as an
immediate corollary to the noiseless quantum channel source theorem of Schumacher
and Jozsa:

Theorem 1. (Quantum source theorem.) Let $|\psi\rangle$ be selected from any ensemble with
density matrix $\rho \otimes \cdots \otimes \rho$. Then as $N \to \infty$, $|\psi\rangle$ is to be
found in the high-probability subspace $\mathcal{H}_N$ with probability 1. $\mathcal{H}_N$ is a
minimal subspace with this property, in the sense that any other such subspace contains
$\mathcal{H}_N$.

That is, as $N \to \infty$, the ensemble $\mathcal{E}_{\mathcal{H}_N}$ contains with probability
one the members of any ensemble with density matrix $\rho \otimes \cdots \otimes \rho$. A more
precise statement of theorem 1 is that as $N \to \infty$,
$\sum_\psi p_\psi \langle\psi|P_{\mathcal{H}_N}|\psi\rangle \to 1$, where $P_{\mathcal{H}_N}$ is the
projection operator onto $\mathcal{H}_N$. By Shannon's source theorem, the dimension of
$\mathcal{H}_N$ is approximately $e^{NS}$, where $S = -\mathrm{tr}\,\rho\ln\rho$. As with Shannon's
theorems for classical sources, which simplify the analysis
of the classical noisy channel by focusing on high-probability inputs, and as with the use of
high-probability subspaces in the noiseless quantum channel theorem in references (1) and
(3), the quantum source theorem simplifies the analysis of the noisy quantum channel by
focusing on a particular subspace of inputs. A coding scheme that works for any ensemble
with density matrix $\rho \otimes \cdots \otimes \rho$ works for the states in the high-probability
subspace.
Conversely, a coding scheme that works for the high-probability subspace works for any of 
the ensembles that it contains. Accordingly, from this point on, quantum sources will be 
taken to be ensembles over high-probability subspaces unless otherwise stated. 
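
The counting behind the high-probability subspace can be illustrated with a small calculation.
The sketch below assumes a qubit source whose density matrix has eigenvalues p and 1 - p, so
that basis states of the subspace are labelled by binary sequences; the tolerance `delta` and
the function names are illustrative assumptions, not quantities from the paper.

```python
import math

def entropy_nats(p):
    """Entropy S = -sum x ln x of the eigenvalue pair (p, 1 - p), in nats."""
    return -sum(x * math.log(x) for x in (p, 1 - p) if x > 0)

def typical_set(p, n, delta):
    """Count length-n binary sequences whose number of 1s lies within
    delta * n of p * n, and return (count, total probability of those sequences)."""
    count = 0
    prob = 0.0
    for k in range(n + 1):
        if abs(k - p * n) <= delta * n:
            ways = math.comb(n, k)          # sequences with exactly k ones
            count += ways
            prob += ways * p**k * (1 - p) ** (n - k)
    return count, prob

p, delta = 0.1, 0.05
count_small, prob_small = typical_set(p, 200, delta)
count_large, prob_large = typical_set(p, 1000, delta)

# Growth rate of the subspace dimension, to be compared with the entropy S.
rate_large = math.log(count_large) / 1000
source_entropy = entropy_nats(p)
```

In this sketch the probability carried by the typical sequences moves toward 1 as N grows, while
the logarithm of their number per symbol stays close to S, with a gap that shrinks as `delta`
is tightened.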

2. The Quantum Channel 

A quantum communications channel takes quantum information as input and produces 
quantum information as output. An optical fiber is an example of a quantum channel: a 
photon in some quantum state goes in, suffers noise and distortion in passing through 
the fiber, and if it is not absorbed and does not tunnel out, emerges in a transformed 
quantum state. In the normal formulation of quantum mechanics, the ingoing system 
that carries quantum information is described by a density matrix $\rho_{\mathrm{in}}$, and the
outgoing system is described by a density matrix
$\rho_{\mathrm{out}} = \mathcal{S}(\rho_{\mathrm{in}})$, where $\mathcal{S}$ is a trace-preserving
linear operator called a super-scattering operator. For simplicity, the channel will be assumed
to be time-independent and memoryless, so that it has the same effect on each quantum
bit that goes through. (The generalization to time-dependent channels with memory is
straightforward.)

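An equivalent method of formulating the channel's dynamics is to specify its effect on
each of an orthonormal basis $\{|\phi_i\rangle\}$ of input states: the output of the channel for
input $|\phi_i\rangle$ is then given by the ensemble
$\mathcal{E}_{|\phi_i\rangle} = \{(|\psi_{j(i)}\rangle, p_{j(i)})\}$ of output states into
which $|\phi_i\rangle$ can evolve, together with the probabilities $p_{j(i)}$ that
$|\phi_i\rangle$ evolves into the state $|\psi_{j(i)}\rangle$. The density matrix and ensemble
pictures of the effect of the channel are related as follows:
$\mathcal{S}(|\phi_i\rangle\langle\phi_{i'}|) = \sum_j \sqrt{p_{j(i)}\,p_{j(i')}}\, |\psi_{j(i)}\rangle\langle\psi_{j(i')}|$,
which for $i = i'$ gives

$\mathcal{S}(|\phi_i\rangle\langle\phi_i|) = \sum_j p_{j(i)}\, |\psi_{j(i)}\rangle\langle\psi_{j(i)}|$.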

For example, if the channel is noiseless and distortion-free, then $\mathcal{S}$ is the identity
operator, and $\mathcal{E}_{|\phi_i\rangle} = \{(|\phi_i\rangle, 1)\}$. This channel transmits both
classical and quantum information perfectly. Another example is the completely decohering
channel, which can be thought of as the channel that destroys off-diagonal terms in the
density matrix:
$\mathcal{S}(\sum_{ij} \alpha_{ij} |\phi_i\rangle\langle\phi_j|) = \sum_i \alpha_{ii} |\phi_i\rangle\langle\phi_i|$,
or equivalently and perhaps more intuitively, as the channel that randomizes the phases of
input states:
$|\phi_i\rangle \to \mathcal{E}_{|\phi_i\rangle} = \{(e^{i\lambda}|\phi_i\rangle, p(\lambda) = 1/2\pi)\}$.
The completely
decohering channel highlights the difference between the use of quantum channels to carry 
classical information and their use in carrying quantum information: it transmits classical 
information perfectly, but transmits no quantum information at all — no superpositions 
or entanglements survive transmission. 
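
The two descriptions of the completely decohering channel can be compared on a single qubit.
This is a minimal sketch under stated assumptions: the channel acts in the computational
basis, the random phase is applied to the second basis state relative to the first, and the
continuous phase average is replaced by an average over a uniform grid of phases. The function
names are hypothetical.

```python
import numpy as np

def decohere(rho):
    """Completely decohering channel: keep only the diagonal of rho in the chosen basis."""
    return np.diag(np.diag(rho))

def phase_average(rho, steps=8):
    """Average U rho U^dagger over U = diag(1, e^{i*lam}) for lam on a uniform grid."""
    total = np.zeros_like(rho, dtype=complex)
    for lam in 2 * np.pi * np.arange(steps) / steps:
        u = np.diag([1.0, np.exp(1j * lam)])
        total += u @ rho @ u.conj().T
    return total / steps

# A superposition (|0> + |1>)/sqrt(2): quantum information the channel destroys.
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho_plus = np.outer(plus, plus.conj())

# A basis state |0>: classical information the channel preserves.
rho_zero = np.diag([1.0, 0.0])

out_plus = decohere(rho_plus)
out_zero = decohere(rho_zero)
```

The superposition comes out as the maximally mixed state in both pictures, while the basis
state passes through unchanged, which is the sense in which the channel carries classical but
not quantum information.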

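Most quantum channels are neither noiseless nor completely decohering. The next
theorem quantifies just how much quantum information can be sent down a noisy,
decohering channel. As above, we restrict our attention to stationary ergodic sources with
density matrix $\rho_{\mathrm{in}} = \sum_i p_i |\phi_i\rangle\langle\phi_i|$. The inputs to the
channel are then described by a density matrix
$\rho^N_{\mathrm{in}} = \rho_{\mathrm{in}} \otimes \cdots \otimes \rho_{\mathrm{in}}$, and the output
is described by a density matrix
$\rho^N_{\mathrm{out}} = \rho_{\mathrm{out}} \otimes \cdots \otimes \rho_{\mathrm{out}}$, where
$\rho_{\mathrm{out}} = \mathcal{S}(\rho_{\mathrm{in}}) = \sum_{i,j(i)} p_i\, p_{j(i)}\, |\psi_{j(i)}\rangle\langle\psi_{j(i)}|$.

As $N \to \infty$, input states come from the subspace $\mathcal{H}^N_{\mathrm{in}}$ with
probability 1, and output states lie in the subspace $\mathcal{H}^N_{\mathrm{out}}$ spanned by
high-probability sequences of outputs,
$|\psi_{j_1(i_1)}\rangle \cdots |\psi_{j_N(i_N)}\rangle$, where each $|\psi_{j(i)}\rangle$ appears in
the sequence $\approx p_i\, p_{j(i)} N$ times. The dimension of $\mathcal{H}^N_{\mathrm{out}}$ is
$\approx 2^{-N \mathrm{tr}\,\rho_{\mathrm{out}} \log_2 \rho_{\mathrm{out}}}$. To gauge the quantity of
quantum information sent down the channel, look at the effect of the channel on a typical
input state
$|\alpha_N\rangle = \sum \alpha_{i_1 \ldots i_N} |\phi_{i_1}\rangle \cdots |\phi_{i_N}\rangle \in \mathcal{H}^N_{\mathrm{in}}$,
where the sum is over high-probability input sequences in which $|\phi_i\rangle$ appears
$\approx p_i N$ times. We have:

Theorem 2. (Quantum channel theorem.) As $N \to \infty$, when $|\alpha_N\rangle$ is input to the
channel, the output lies with probability 1 in a minimal subspace $\mathcal{H}_\alpha$ whose
average dimension over $\alpha_N$ is the minimum of $e^{N S_{\mathrm{out}}}$, $e^{N S_\alpha}$,
where $S_\alpha = -\mathrm{tr}\,\rho_\alpha \ln \rho_\alpha$ and
$\rho_\alpha = \sum_{i,i'} \sqrt{p_i\, p_{i'}}\; \mathcal{S}(|\phi_i\rangle\langle\phi_{i'}|) \otimes |\phi_i\rangle\langle\phi_{i'}|$.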

The proof of theorem 2 is somewhat involved, but the form of $\rho_\alpha$ can be understood
simply. One of the primary uses of a quantum channel is the distribution of entangled
quantum states for the purpose of quantum cryptography or teleportation. Take a
two-variable entangled state of the form $\sum_i \sqrt{p_i}\, |\phi_i\rangle|\phi_i\rangle$. Like the
state $(1/\sqrt{2})(|0\rangle|0\rangle + |1\rangle|1\rangle)$
described in the introduction, this state is a maximally entangled state that registers all
the states $|\phi_i\rangle$ at once; the factors of $\sqrt{p_i}$ ensure that each of the two
quantum variables taken on its own is described by a density matrix $\rho_{\mathrm{in}}$. Now send
the first variable down the channel. The result is a partially entangled state for the two
variables described by density matrix $\rho_\alpha$. That is, $S_\alpha$ is the entropy increase
when one of two fully entangled variables is sent down the channel. A thorough treatment of
the effect of noisy channels on entangled states can be found in reference (18). The effect of
the channel on an $N$-variable state $|\alpha_N\rangle$ can be understood as follows: almost all
input states $|\alpha_N\rangle$ are fully entangled, with density matrix $\rho_{\mathrm{in}}$
describing each variable on its own. Sending $n$ of the variables
through the channel then increases the entropy by $n S_\alpha$, which is in turn the logarithm of
the dimension of the minimal subspace that can encompass the channel's possible outputs.
If $S_\alpha > S_{\mathrm{out}}$, then sending all the variables through completely randomizes the
output as $N \to \infty$, and no coherent quantum information survives the transmission through
the channel.

Theorem 2 suggests that the amount of quantum information transmitted down the
channel from a stationary, ergodic source with density matrix $\rho_{\mathrm{in}}$ be defined as
$I_Q(\rho_{\mathrm{in}}) = -\mathrm{tr}\,\rho_{\mathrm{out}} \log_2 \rho_{\mathrm{out}} + \mathrm{tr}\,\rho_\alpha \log_2 \rho_\alpha = S_{\mathrm{out}} - S_\alpha$
if $S_{\mathrm{out}} > S_\alpha$, and $I_Q = 0$ otherwise. This definition of
quantum information transmitted is the quantum analog of mutual information between
channel inputs and outputs: when pure states are sent down the channel, $I_Q$ tells how
much information one gets about which pure state in $\mathcal{H}_{\mathrm{in}}$ went in by looking
at the noisy mixed state in $\mathcal{H}_{\mathrm{out}}$ that comes out.

The full justification of Iq as the quantum information transmitted down a quantum 
channel will be presented in the next section, in which quantum coding schemes will be 
presented that allow the reliable transmission of quantum information at a rate Iq, and 
in which it will be noted that no coding schemes exist for stationary, ergodic sources that 
can surpass this rate. For the moment, consider three examples of quantum channels, each 
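with source described by $\rho_{\mathrm{in}} = (1/2)(|0\rangle\langle 0| + |1\rangle\langle 1|)$.
(i) In the noiseless quantum channel,
$-\mathrm{tr}\,\rho_{\mathrm{out}} \log_2 \rho_{\mathrm{out}} = 1$,
$-\mathrm{tr}\,\rho_\alpha \log_2 \rho_\alpha = 0$, and $I_Q = 1$ qubit, reflecting the fact that each
qubit is received as sent. (ii) In the completely decohering/dephasing channel,
$-\mathrm{tr}\,\rho_{\mathrm{out}} \log_2 \rho_{\mathrm{out}} = 1$,
$\rho_\alpha = (1/2)(|0\rangle\langle 0| \otimes |0\rangle\langle 0| + |1\rangle\langle 1| \otimes |1\rangle\langle 1|)$,
$-\mathrm{tr}\,\rho_\alpha \log_2 \rho_\alpha = 1$, and $I_Q = 0$ qubits, so
that no quantum information is sent. (iii) Consider a partly dephasing channel in which
$|0\rangle\langle 0| \to |0\rangle\langle 0|$, $|1\rangle\langle 1| \to |1\rangle\langle 1|$, and
$|0\rangle\langle 1| \to (1 - \epsilon)|0\rangle\langle 1|$,
$|1\rangle\langle 0| \to (1 - \epsilon)|1\rangle\langle 0|$. Here,

$\rho_\alpha = (1/2)(|0\rangle\langle 0| \otimes |0\rangle\langle 0| + |1\rangle\langle 1| \otimes |1\rangle\langle 1|) + ((1 - \epsilon)/2)(|1\rangle\langle 0| \otimes |1\rangle\langle 0| + |0\rangle\langle 1| \otimes |0\rangle\langle 1|)$

and
$-\mathrm{tr}\,\rho_\alpha \log_2 \rho_\alpha = -(1 - \epsilon/2)\log_2(1 - \epsilon/2) - (\epsilon/2)\log_2(\epsilon/2)$,
giving an $I_Q$ that ranges continuously from 1 for $\epsilon = 0$ (no decoherence) to 0 for
$\epsilon = 1$ (complete decoherence).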
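
The three examples can be reproduced numerically from the definition of the quantum information
transmitted as the output entropy minus the entropy of the state left after one half of an
entangled pair passes through the channel. The sketch below is an illustration under
assumptions: the channel is supplied as a Python function acting on 2x2 matrices, the pair state
is built as the sum over i, i' of sqrt(p_i p_i') S(|i><i'|) tensor |i><i'|, and all function names
are hypothetical.

```python
import numpy as np

def entropy_bits(rho):
    """Von Neumann entropy -tr(rho log2 rho), ignoring numerically zero eigenvalues."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def dephasing_channel(eps):
    """Partly dephasing channel: off-diagonal terms are scaled by (1 - eps)."""
    def channel(op):
        out = np.array(op, dtype=complex)
        out[0, 1] *= 1 - eps
        out[1, 0] *= 1 - eps
        return out
    return channel

def quantum_information(channel, probs):
    """I_Q = S_out - S_alpha when positive, else 0, for a source diagonal in the basis."""
    d = len(probs)
    basis = np.eye(d)
    rho_out = np.zeros((d, d), dtype=complex)
    rho_alpha = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        rho_out += probs[i] * channel(np.outer(basis[i], basis[i]))
        for k in range(d):
            unit = np.outer(basis[i], basis[k])   # the operator |i><k|
            weight = np.sqrt(probs[i] * probs[k])
            rho_alpha += weight * np.kron(channel(unit), unit)
    return max(entropy_bits(rho_out) - entropy_bits(rho_alpha), 0.0)

source = [0.5, 0.5]
iq_noiseless = quantum_information(dephasing_channel(0.0), source)
iq_half = quantum_information(dephasing_channel(0.5), source)
iq_decohered = quantum_information(dephasing_channel(1.0), source)
```

Setting `eps` to 0 and 1 recovers examples (i) and (ii), and intermediate values trace out the
continuous range described in example (iii).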
3. Optimal codes for the noisy quantum channel

Define the capacity of a quantum channel to carry quantum information to be
$C_Q = \max_{\rho_{\mathrm{in}}} I_Q(\rho_{\mathrm{in}})$. $C_Q$ is the maximum over all sources
$\rho_{\mathrm{in}}$ of the quantum information $I_Q$ transmitted down the channel. We then have
the following

Theorem 3. (Noisy quantum channel coding theorem.) Consider a quantum channel
with capacity $C_Q$. The output of a stationary, ergodic source with density matrix $\rho$ can
be encoded, sent down the channel, and decoded with reliability $\to 1$ as $N \to \infty$ if and
only if $-\mathrm{tr}\,\rho \log_2 \rho < C_Q$.

Like Shannon's noisy coding theorem, theorem 3 comes with the caveat that it applies
to high-probability sources. The proof of theorem 3 will be given elsewhere, but the idea
behind the proof, as well as the theorem's meaning and implications, can be understood
as follows. The noisy, decohering quantum channel has two effects on the quantum
information that it transmits. First, like the classical channel, it adds noise to the signal,
flipping qubits and adding random information. Second, it decoheres the signal by
randomizing phases and acquiring information about the quantum information transmitted.
Decoherence is an effect with no classical analog: classical signals do not have phases, and 
acquiring information about a classical signal is harmless as long as the signal is not altered 
in the process. In quantum mechanics, however, acquiring information about the signal 
means effectively making a measurement on it, and quantum measurement unavoidably 
alters most quantum systems. 

The problem of decoherence implies that the signal must be encoded in such a way that
any information the channel gets about the encoded state reveals nothing about which 
state of the source was sent. Otherwise, the channel can effectively 'measure' the output 
of the source, irretrievably disturbing it in the process. As noted by Shor, this may be
accomplished by encoding the signal as an entangled state. In fact, each encoded signal 
must have the same density matrix as each other encoded signal for each qubit sent 
down the channel: otherwise the channel can distinguish between different signals and 
decohere them. If the signals are encoded as entangled states in this fashion, the channel 
can decohere the codeword, but it cannot decohere the original signal. 

Suppose someone hands you a quantum system in some unknown state selected from
an ensemble with density matrix $\rho \otimes \cdots \otimes \rho$, and asks you to transmit it
reliably down a noisy, decoherent quantum channel. What do you do? (If someone hands you a
system in a known quantum state, no quantum channel is necessary: you can just use a classical
channel to transmit instructions for recreating the state using a quantum computer.) The
following encoding attains the channel capacity: First, identify a source for the channel
that attains the channel capacity, so that $I_Q(\rho_{\mathrm{in}}) = C_Q$. Next, encode the state
to be transmitted by applying a transformation that maps an orthonormal basis for the input
high-probability subspace to a randomly chosen set of orthogonal states taken from the
high-probability subspace of the source that attains the channel capacity. Such random
states have the desired property that they are fully entangled, and each qubit in the
encoded signal has density matrix $\rho_{\mathrm{in}}$. Now send the encoded signal down the
channel. Because the states are fully entangled, the channel cannot get any information about
the original pre-encoded state: all the channel can do to disrupt the encoded state is add
entropy $S_{\mathrm{out}} - C_Q$ per symbol transmitted. That is, the encoding protects the
original state from decoherence; and as long as $-\mathrm{tr}\,\rho \log_2 \rho < C_Q$ there is
enough redundancy in the encoded state to recreate the original state, just as in the classical
case. This method works equally well if the initial state is pure, mixed, or entangled with some
other system.

Examples: In the three cases discussed in the previous section, the channel capacity is just 
Iq, as calculated. The important fact to note is that even very high levels of decoherence 
($\epsilon \to 1$) can be tolerated in principle. A case of considerable interest is that in
which each qubit system sent down the channel has a probability $\eta$ of being decohered and
randomized. In this case,

$\rho_\alpha = \sum_{i,i'=0,1} \left( ((1-\eta)/2)\, |i\rangle\langle i'| \otimes |i\rangle\langle i'| + (\eta/4)\, |i\rangle\langle i| \otimes |i'\rangle\langle i'| \right)$.

$S_\alpha$ can be calculated for this case and is equal to
$-(3\eta/4)\log_2(\eta/4) - (1 - 3\eta/4)\log_2(1 - 3\eta/4)$, which is equal to 1 for
$\eta \approx .252$. The highest rate of errors that can be corrected
by an optimal coding procedure is just above 1/4 (see also reference (12)). This example 
contrasts with the classical channel, in which arbitrarily high levels of noise can be tolerated 
in principle: quantum coding can correct for arbitrarily high levels either of noise, or of 
decoherence, but not of both together. 
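
The threshold quoted in this example can be checked by locating the value of the randomization
probability at which the pair-state entropy reaches 1 bit. The following is a rough numerical
sketch, assuming the pair state equals a maximally entangled state with weight 1 - eta plus the
fully mixed two-qubit state with weight eta, which is what the sum in the text evaluates to;
the bisection routine and all names are illustrative assumptions.

```python
import numpy as np

def entropy_bits(rho):
    """Von Neumann entropy -tr(rho log2 rho), ignoring numerically zero eigenvalues."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def rho_alpha_randomized(eta):
    """(1 - eta)|Phi><Phi| + eta * I/4, with |Phi> = (|00> + |11>)/sqrt(2)."""
    phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
    return (1 - eta) * np.outer(phi, phi) + eta * np.eye(4) / 4

def closed_form(eta):
    """Entropy formula quoted in the text, valid for 0 < eta < 1."""
    q = 3 * eta / 4
    return -q * np.log2(eta / 4) - (1 - q) * np.log2(1 - q)

def threshold(lo=1e-6, hi=1 - 1e-6, iters=60):
    """Bisect for the eta at which the entropy crosses 1 bit (entropy grows with eta)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy_bits(rho_alpha_randomized(mid)) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eta_star = threshold()
```

With a one-bit output entropy, the quantum information transmitted falls to zero at `eta_star`,
which lands near the value of about 0.252 given in the text.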

Discussion 

In practice, even if the channel capacity is not exceeded, the amount of noise and
decoherence that can be tolerated is limited by the ability to encode and decode: as
$N \to \infty$, the error in the transmitted state goes to zero, but the amount of quantum
information processing that must be done to encode and decode becomes large. The
encoding and decoding itself must be performed reliably.

The usefulness of the classical noisy coding theorem is also limited by coding difficul- 
ties: in particular, random codes are hard to encode and decode. In this respect, however, 
the quantum theorem has a considerable advantage. As Shannon noted, random codes are 
effective because the bits that make up the signal have no apparent order. In the classi- 
cal case, this implies that sequences of bits must appear random. In the quantum case, 
however, as long as the encoded signal is fully entangled, each qubit in the signal taken 
on its own appears to be completely random. As a result, the code words themselves may
be highly regular: a simple example of a set of codewords that are easy to encode and
decode, but are sufficiently random to attain the channel capacity, is the set of $N$-qubit
analogs of the familiar two-qubit entangled states

$(1/\sqrt{2})(|01\rangle - |10\rangle)$, $(1/\sqrt{2})(|01\rangle + |10\rangle)$, $(1/\sqrt{2})(|00\rangle - |11\rangle)$, $(1/\sqrt{2})(|00\rangle + |11\rangle)$.

In the classical case, random codes are hard to construct. In the quantum case, codes 
that are sufficiently random to attain the channel capacity can be constructed by a brief 
quantum computation. 
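
The claim that each qubit of a fully entangled codeword looks completely random on its own can
be verified for the four two-qubit states listed above. This is a minimal sketch, assuming the
amplitude ordering |00>, |01>, |10>, |11>; the helper `reduced_first_qubit` is a hypothetical
name.

```python
import numpy as np

def reduced_first_qubit(state):
    """Partial trace over the second qubit of a two-qubit pure state."""
    # m[a, b] is the amplitude of |a b>, so m m^dagger sums over the second index.
    m = np.asarray(state, dtype=complex).reshape(2, 2)
    return m @ m.conj().T

s = 1 / np.sqrt(2)
# Assumed amplitude ordering: |00>, |01>, |10>, |11>.
codewords = {
    "01-10": s * np.array([0, 1, -1, 0]),
    "01+10": s * np.array([0, 1, 1, 0]),
    "00-11": s * np.array([1, 0, 0, -1]),
    "00+11": s * np.array([1, 0, 0, 1]),
}

# Single-qubit density matrix seen by anything that looks at only the first qubit.
reduced = {name: reduced_first_qubit(vec) for name, vec in codewords.items()}

# Gram matrix of overlaps between codewords; the identity means they are orthonormal.
vectors = np.array(list(codewords.values()))
gram = vectors @ vectors.conj().T
```

All four codewords give the same maximally mixed single-qubit matrix, so a channel acting on one
qubit cannot tell them apart, yet they remain mutually orthogonal and hence perfectly
distinguishable by a joint measurement.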

In conclusion, this paper has derived fundamental limits on the amount of quantum 
information that can be sent reliably down a quantum channel, and has exhibited codes 
that attain those limits. In fact, almost all codes attain those limits. As with Shannon's 
classical noisy coding theorem, the rate of transmission of quantum information remains 
finite as the probability of error goes to zero. 
Appendix



Properties of ensembles of states. The idea behind the ensemble picture of quantum
mechanics is to deal with mixtures and superpositions in the same formalism. Accordingly,
a primary purpose of the ensemble picture is to make an explicit distinction between
quantum states that can interfere with each other, and quantum states that can't. The
ensemble picture is constructed so that different members of an ensemble cannot interfere
with each other, while corresponding members of different ensembles can interfere. The
second purpose of the ensemble picture is to keep track explicitly of the normalization of 
states, so that high-probability sets of states can be identified correctly. 

As noted in section 1, a quantum ensemble
$\mathcal{E}_\psi = \{(|\psi_j\rangle, p_j)\}$ is a set of quantum states
together with their probabilities. Ensembles are collections of vectors, and share many
properties of vectors. For example, if $\mathcal{E}_\psi = \{(|\psi_j\rangle, p_j)\}$ and
$\mathcal{E}_\phi = \{(|\phi_j\rangle, q_j)\}$, we can define a scalar product
$\mathcal{E}_\psi \cdot \mathcal{E}_\phi = \sum_j \sqrt{p_j q_j}\, \langle\psi_j|\phi_j\rangle$. If
$\mathcal{E}$ is normalized, then
$\mathcal{E} \cdot \mathcal{E} = \mathrm{tr}\,\rho_{\mathcal{E}} = 1$. (Note
that the rule for obtaining the proper statistics is to associate a factor of $\sqrt{p_j}$ with
each occurrence of $|\psi_j\rangle$.) This vector-like character of ensembles allows the
straightforward characterization of properties of quantum operators. For example, the
trace-preserving character of the super-scattering operator (section 2) can be summarized by
the requirement that
$\mathcal{E}_{|\phi_i\rangle} \cdot \mathcal{E}_{|\phi_{i'}\rangle} = \delta_{ii'}$.

A type of ensemble that will prove useful below is one that is obtained by
superposing corresponding states from two ensembles. If corresponding states have the same
probability, for example if $p_j = q_j$ for the ensembles $\mathcal{E}_\psi$, $\mathcal{E}_\phi$
above, then the ensemble of superpositions of $\alpha$ times the states of $\mathcal{E}_\phi$
plus $\beta$ times the corresponding states of $\mathcal{E}_\psi$ is
just $\{(\alpha|\phi_j\rangle + \beta|\psi_j\rangle, p_j)\}$, with density matrix $\rho$ as above.
In fact, because we will work with ensembles of high-probability states, which have equal
probabilities, this is the type of ensemble that we will have occasion to use below. If the
corresponding states from the different ensembles do not have the same probabilities, then we
write the ensemble of superposed states as
$\mathcal{E}_{\alpha\phi+\beta\psi} = \{(\alpha|\phi_j\rangle + \beta|\psi_j\rangle,\; q_j, p_j)\}$
to indicate the ensemble obtained by superposing $\alpha$ times the states of
$\mathcal{E}_\phi$ plus $\beta$ times the corresponding states of $\mathcal{E}_\psi$, together
with a list $q_j, p_j$ of the probabilities of the individual states in the superposition. We
specify superposition ensembles in this fashion to keep track explicitly of the normalization of
the individual states in the superposition. The proper overall normalization of such ensembles
is obtained as above by associating a factor of $\sqrt{p_j}$ with each $|\psi_j\rangle$ and a
factor of $\sqrt{q_j}$ with each $|\phi_j\rangle$, so that

$\rho_{\alpha\phi+\beta\psi} = \sum_j \left( \alpha\bar\alpha\, q_j\, |\phi_j\rangle\langle\phi_j| + \alpha\bar\beta \sqrt{q_j p_j}\, |\phi_j\rangle\langle\psi_j| + \beta\bar\alpha \sqrt{p_j q_j}\, |\psi_j\rangle\langle\phi_j| + \beta\bar\beta\, p_j\, |\psi_j\rangle\langle\psi_j| \right)$.

Note that the superposition ensemble has the same density matrix as the ensemble of
unnormalized states,
$\{(\alpha\sqrt{q_j}|\phi_j\rangle + \beta\sqrt{p_j}|\psi_j\rangle, 1)\}$. If we wish to superpose
many ensembles, $\mathcal{E}_i = \{(|\psi_{j(i)}\rangle, p_{j(i)})\}$, we will use $i$ to index the
ensembles, and $j$ to index the different members of each ensemble: e.g.,
$\mathcal{E}_P = \{(\sum_i |\psi_{j(i)}\rangle, p_{j(i)})\}$ is the ensemble obtained by
superposing the $j$'th members of each of the ensembles with probability $p_{j(i)}$ associated
with the $j$'th member of the $i$'th ensemble. $\mathcal{E}_P$ has density matrix
$\rho_P = \sum_{i,i',j} \sqrt{p_{j(i)}\, p_{j(i')}}\, |\psi_{j(i)}\rangle\langle\psi_{j(i')}|$.
In this notation, states with different $j$ cannot interfere, but states with the same $j$ but
different $i$ can interfere.
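
The equality between the four-term density matrix of a superposition ensemble and the density
matrix of the corresponding unnormalized states can be checked on a small case. The sketch below
is illustrative only: the two-member ensembles, the coefficients, and the function names are
arbitrary choices made for the example, not values from the paper.

```python
import numpy as np

def superposition_density(alpha, beta, phis, qs, psis, ps):
    """Four-term sum: each |phi_j> carries sqrt(q_j), each |psi_j> carries sqrt(p_j)."""
    dim = len(phis[0])
    rho = np.zeros((dim, dim), dtype=complex)
    for phi, q, psi, p in zip(phis, qs, psis, ps):
        phi = np.asarray(phi, dtype=complex)
        psi = np.asarray(psi, dtype=complex)
        rho += alpha * np.conj(alpha) * q * np.outer(phi, phi.conj())
        rho += alpha * np.conj(beta) * np.sqrt(q * p) * np.outer(phi, psi.conj())
        rho += beta * np.conj(alpha) * np.sqrt(p * q) * np.outer(psi, phi.conj())
        rho += beta * np.conj(beta) * p * np.outer(psi, psi.conj())
    return rho

def unnormalized_density(alpha, beta, phis, qs, psis, ps):
    """Density matrix of the states alpha*sqrt(q_j)|phi_j> + beta*sqrt(p_j)|psi_j>, weight 1."""
    dim = len(phis[0])
    rho = np.zeros((dim, dim), dtype=complex)
    for phi, q, psi, p in zip(phis, qs, psis, ps):
        vec = alpha * np.sqrt(q) * np.asarray(phi, dtype=complex)
        vec = vec + beta * np.sqrt(p) * np.asarray(psi, dtype=complex)
        rho += np.outer(vec, vec.conj())
    return rho

# Arbitrary illustrative ensembles with unequal probabilities.
s = 1 / np.sqrt(2)
phis, qs = [[1, 0], [0, 1]], [0.3, 0.7]
psis, ps = [[s, s], [s, -s]], [0.6, 0.4]
alpha, beta = 0.6, 0.8j

rho_terms = superposition_density(alpha, beta, phis, qs, psis, ps)
rho_states = unnormalized_density(alpha, beta, phis, qs, psis, ps)
```

Members with the same index j contribute cross terms, while members with different j are simply
added, which mirrors the interference rule stated in the text.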

This definition of superpositions of ensembles allows us to complete the identification
of ensembles with vectors by defining
$\alpha\mathcal{E}_\phi + \beta\mathcal{E}_\psi = \mathcal{E}_{\alpha\phi+\beta\psi}$. In addition,
this definition of
superposition makes a self-consistent connection between the ensemble and superscattering 
pictures of time evolution, a fact that will prove useful below. The ensemble picture is 
related to the operator sum representation of superscattering operators described, e.g., in 
reference (20). 

Appendix 1 

Outline of the proof of theorem 1. Theorem 1 follows directly from the results of
references (1) and (3), where a detailed treatment of high-probability subspaces may be
found. The proof goes as follows. If $|\psi\rangle$ is selected, with probability then

is just the classical probability of the set of high-probability sequences, and $\to 1$ as $N \to \infty$.
As a result, for any $\epsilon > 0$, $N$ can be picked sufficiently large so that a state picked from
any stationary, ergodic ensemble with density matrix $\rho$ has overlap $> 1 - \epsilon$ with some
state in the high-probability subspace, with probability $> 1 - \epsilon$. Minimality follows since the
high-probability states themselves form an ensemble with density matrix $\rho \otimes \cdots \otimes \rho$ as
$N \to \infty$. Minimality is a relatively weak property: the high-probability subspace
need not be the only minimal subspace; but all other such minimal subspaces have
approximately the same dimension as $N \to \infty$.
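
The classical fact used here, that the set of high-probability sequences carries probability approaching 1, can be illustrated with a short Python sketch for a two-outcome source. The window half-width `delta` that defines "high-probability" and the function names are assumptions of this sketch, not quantities fixed by the text.

```python
import math

def log_binomial_pmf(n, k, p):
    """Natural log of the probability of exactly k ones in n independent draws."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def typical_probability(n, p, delta):
    """Total probability of sequences whose fraction of ones lies within delta of p."""
    lo = max(0, math.ceil((p - delta) * n))
    hi = min(n, math.floor((p + delta) * n))
    return sum(math.exp(log_binomial_pmf(n, k, p)) for k in range(lo, hi + 1))

# The probability of the typical set grows toward 1 as the block length increases.
for n in (50, 200, 2000):
    print(n, typical_probability(n, 0.25, 0.05))
```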

Appendix 2 

Outline of the proof of theorem 2. There are several ways to prove the noisy channel
theorem. One way is to follow along the lines suggested in the text and analyze the 
channel's effect on entangled states. The following method of proof is closer in spirit to 
the classical derivation of channel capacity. 

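In the density matrix picture of the channel, the channel has the effect,

$$|\alpha\rangle\langle\alpha| \to \rho^N_\alpha = \sum_{i_1 \ldots i_N,\, i'_1 \ldots i'_N} \alpha_{i_1 \ldots i_N} \bar\alpha_{i'_1 \ldots i'_N}\, S(|\phi_{i_1}\rangle\langle\phi_{i'_1}|) \otimes \cdots \otimes S(|\phi_{i_N}\rangle\langle\phi_{i'_N}|) , \qquad (2.1)$$

where the sum is taken over high-probability sequences in which $i$ appears $\approx p_i N$ times.
Equivalently, in the ensemble picture,

$$|\alpha\rangle \to \mathcal{E}^N_\alpha = \Big\{ \Big( \sum_{i_1 \ldots i_N} \alpha_{i_1 \ldots i_N} |\psi_{j_1}(i_1)\rangle \cdots |\psi_{j_N}(i_N)\rangle ,\ p_{j_1}(i_1) \cdots p_{j_N}(i_N) \Big) \Big\} \qquad (2.2)$$

$$= \Big\{ \Big( \sum_i \alpha_i |\psi_j(i)\rangle ,\ p_j(i) \Big) \Big\} , \qquad (2.3)$$

where the superposition ensemble is defined as above and has density matrix $\rho^N_\alpha$.
Theorem 1 implies that as $N \to \infty$, then with probability 1, the states of $\mathcal{E}^N_\alpha$ are to be
found in the Hilbert space spanned by high-probability states of the form

$$\sum_{i_1 \ldots i_N} \alpha_{i_1 \ldots i_N} |\psi_{j_1}(i_1)\rangle \cdots |\psi_{j_N}(i_N)\rangle ,$$

where in each term of the superposition, $|\psi_j(i)\rangle$ appears $\approx p_i p_j(i) N$ times. The minimality
of $\mathcal{H}^N_\alpha$ follows as in theorem 1. This proves the first part of theorem 2.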

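The dimension of the output Hilbert space is equal to one over the average overlap
of two members of that space: $\dim \mathcal{H}^N_\alpha = (\mathrm{tr}_{hp} (\rho^N_\alpha)^2)^{-1}$, where the trace $\mathrm{tr}_{hp}$ is taken over
high-probability sequences only. We wish to calculate the average dimension of the output
Hilbert space over $\alpha$. Using the fact that $\langle \alpha_{i_1 \ldots i_N} \bar\alpha_{i'_1 \ldots i'_N} \rangle_\alpha = p_{i_1} \cdots p_{i_N}\, \delta_{i_1 i'_1} \cdots \delta_{i_N i'_N}$,
after some algebra, we obtain

$$\langle \mathrm{tr}_{hp} (\rho^N_\alpha)^2 \rangle_\alpha = \mathrm{tr}_{hp} ((\rho_{out})^N)^2 + \mathrm{tr}_{hp} ((\rho_\alpha)^N)^2 - \mathrm{tr}_{hp} ((\rho_{i/o})^N)^2 , \qquad (2.4)$$

where $\rho_{out}$ and $\rho_\alpha$ are defined as above, $\rho_{i/o} = \sum_i p_i\, S(|\phi_i\rangle\langle\phi_i|) \otimes |\phi_i\rangle\langle\phi_i|$, and
$(\rho_\alpha)^N = \rho_\alpha \otimes \cdots \otimes \rho_\alpha$. We can now use the fact that $\mathrm{tr}_{hp} (\rho^N)^2 = 2^{N \mathrm{tr} \rho \log_2 \rho}$, which can be simply
verified in a basis in which $\rho$ is diagonal. We then have

$$\mathrm{tr}_{hp} ((\rho_{out})^N)^2 = 2^{N \mathrm{tr} \rho_{out} \log_2 \rho_{out}} = 2^{-N S_{out}} , \qquad (2.5)$$
$$\mathrm{tr}_{hp} ((\rho_\alpha)^N)^2 = 2^{N \mathrm{tr} \rho_\alpha \log_2 \rho_\alpha} = 2^{-N S_\alpha} , \qquad (2.6)$$
$$\mathrm{tr}_{hp} ((\rho_{i/o})^N)^2 = 2^{N \mathrm{tr} \rho_{i/o} \log_2 \rho_{i/o}} = 2^{-N (\sum_i p_i S_{out}(i) + S_{in})} . \qquad (2.7)$$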

As $N \to \infty$, $\langle (\dim \mathcal{H}^N_\alpha)^{-1} \rangle$ goes to the largest of these three terms, of which the third
is less than or equal to either of the first two. We have actually calculated the average
of the inverse of the dimension of the output subspace: however, the standard deviation
$\sqrt{\langle (\mathrm{tr}_{hp} (\rho^N_\alpha)^2)^2 \rangle_\alpha - \langle \mathrm{tr}_{hp} (\rho^N_\alpha)^2 \rangle_\alpha^2}$ is proportional to $(\langle \mathrm{tr}_{hp} (\rho^N_{out})^2 \rangle_\alpha \langle \mathrm{tr}_{hp} (\rho^N_\alpha)^2 \rangle_\alpha)^{1/2}$ and
so goes to zero exponentially faster in $N$ than $\langle \mathrm{tr}_{hp} (\rho^N_\alpha)^2 \rangle_\alpha$ except when $S_\alpha = S_{out}$, in which
case $C_Q = 0$. As a result, the average of the inverse is the inverse of the average, and the
average of $\dim \mathcal{H}^N_\alpha$ is the smaller of $2^{-N \mathrm{tr} \rho_\alpha \log_2 \rho_\alpha}$ and $2^{-N \mathrm{tr} \rho_{out} \log_2 \rho_{out}}$, proving
the second half of theorem 2. Note also that the standard deviation of the dimension of
$\mathcal{H}^N_\alpha$ as a fraction of the average dimension also goes to zero as $N \to \infty$, showing that
almost all $\alpha$ correspond to an output space of the same dimension.
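
The identity behind (2.5) to (2.7), that the trace of the squared $N$-fold density matrix over high-probability sequences decays as $2^{-NS}$, can be checked numerically in a basis where $\rho$ is diagonal. The sketch below is a simplification under stated assumptions: it takes a qubit with eigenvalues $p$ and $1-p$, and it keeps only the sequences in which the eigenvalue $p$ appears exactly `round(p * n)` times. The function names are hypothetical.

```python
import math

def entropy_bits(p):
    """Entropy in bits of a density matrix with eigenvalues p and 1 - p."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def decay_rate(n, p):
    """-(1/n) log2 of the sum of squared eigenvalues of the n-fold tensor product,
    restricted to sequences where eigenvalue p appears exactly round(p*n) times."""
    k = round(p * n)
    log2_count = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1)) / math.log(2)
    log2_eigenvalue = k * math.log2(p) + (n - k) * math.log2(1 - p)
    # count * eigenvalue**2, evaluated in log space to avoid underflow
    return -(log2_count + 2 * log2_eigenvalue) / n

for n in (100, 1000, 10000):
    print(n, decay_rate(n, 0.25), entropy_bits(0.25))
```

The rate approaches the entropy from above because the number of such sequences is smaller than $2^{NS}$ by a factor that is only polynomial in $N$.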

Appendix 3. 

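Outline of the proof of theorem 3. The high-probability subspace for this source has
dimension $2^{-N \mathrm{tr} \rho \log_2 \rho}$. Encode the basis states $|\chi_i\rangle$ for the source as randomly chosen
orthogonal states $|\alpha_i\rangle$ in the high-probability subspace of a source that attains the channel
capacity. The channel takes each $|\alpha_i\rangle$ to some state in the ensemble $\mathcal{E}^N_{\alpha_i}$ with minimal
subspace $\mathcal{H}^N_{\alpha_i}$. The average over $\alpha_i$ of the overlap $|\langle\psi_{\alpha_i}|\psi_{\alpha_j}\rangle|^2$ of states $|\psi_{\alpha_i}\rangle \in \mathcal{H}^N_{\alpha_i}$,
$|\psi_{\alpha_j}\rangle \in \mathcal{H}^N_{\alpha_j}$ for $j \ne i$ can be calculated as in appendix 2, and is equal to $1/\dim \mathcal{H}^N_{out} = 2^{N \mathrm{tr} \rho_{out} \log_2 \rho_{out}}$.
If $P^N_{\alpha_i}$ is the projection operator onto $\mathcal{H}^N_{\alpha_i}$, we have

$$\mathrm{tr}\, P^N_{\alpha_i} P^N_{\alpha_j} = 2^{-N (-\mathrm{tr} \rho_{out} \log_2 \rho_{out} + \mathrm{tr} \rho_\alpha \log_2 \rho_\alpha)} . \qquad (3.1)$$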

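That is, as $N \to \infty$, the overlap between any two individual output subspaces $\to 0$ as
long as the quantum channel capacity is not zero. The dimension of the direct sum of
the output subspaces remains less than or equal to the dimension of $\mathcal{H}^N_{out}$ if and only if
$-\mathrm{tr} \rho \log_2 \rho \le C_Q$:

$$\dim \bigoplus_i \mathcal{H}^N_{\alpha_i} = 2^{-N (\mathrm{tr} \rho \log_2 \rho + \mathrm{tr} \rho_\alpha \log_2 \rho_\alpha)} = 2^{N (-\mathrm{tr} \rho_{out} \log_2 \rho_{out} - C)} , \qquad (3.2)$$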

where $C = C_Q - (-\mathrm{tr} \rho \log_2 \rho)$. So if $C > 0$, the source entropy does not exceed the channel
capacity, and the output states corresponding to different input basis states all fall in
distinct subspaces. The overlap of any one output subspace with the direct sum of all the
remaining subspaces goes as $2^{-NC}$. If $C < 0$, the output subspaces overlap and no unique
decoding is possible. This proves that $C_Q$ is an upper limit on the channel capacity for
'typical' codewords belonging to the high-probability subspace (i.e., for a set of measure
1 as $N \to \infty$), but it does not rule out the possibility of the use of a set of codewords of
measure 0.
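
The dimension counting above reduces to comparing three entropies, which the following Python sketch makes concrete. It assumes that $C_Q$ can be read as $S_{out} - S_\alpha$ for the chosen source, which is what equating the two expressions in (3.2) implies; the optimization over sources that defines the capacity is not modeled. The eigenvalue lists are hypothetical numbers chosen for illustration, and the function names are assumptions of this sketch.

```python
import math

def von_neumann_entropy(eigenvalues):
    """Entropy in bits, -tr(rho log2 rho), from the eigenvalues of rho."""
    return -sum(x * math.log2(x) for x in eigenvalues if x > 0)

def decoding_margin(source_eigs, alpha_eigs, out_eigs):
    """C = C_Q - S_source, with C_Q taken as S_out - S_alpha (an assumption)."""
    c_q = von_neumann_entropy(out_eigs) - von_neumann_entropy(alpha_eigs)
    return c_q - von_neumann_entropy(source_eigs)

def log2_overlap(n, margin):
    """log2 of the 2^(-N C) overlap of one output subspace with all the others."""
    return -n * margin

# Hypothetical single-symbol spectra.
alpha_eigs = [0.95, 0.05]
out_eigs = [0.5, 0.5]

low_entropy_source = [0.9, 0.1]   # entropy below S_out - S_alpha
high_entropy_source = [0.5, 0.5]  # entropy above S_out - S_alpha

c_ok = decoding_margin(low_entropy_source, alpha_eigs, out_eigs)
c_bad = decoding_margin(high_entropy_source, alpha_eigs, out_eigs)
print("margin C (decodable case):", c_ok, "log2 overlap at N=100:", log2_overlap(100, c_ok))
print("margin C (overlapping case):", c_bad)
```

A positive margin means the overlap exponent is negative and the output subspaces separate as $N$ grows; a negative margin corresponds to the case in which no unique decoding is possible.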

In the case $C \ge 0$, a unitary decoding transformation can now be applied to the output
states to put each vector $\in \mathcal{H}^N_{\alpha_i}$ into the form $|\chi_i\rangle \otimes |\psi^j\rangle$, in which vectors in different
output subspaces but with the same $j$ in (2.3) give the same $|\psi^j\rangle$. Because of the
asymptotic orthogonality of the output spaces, this decoding recreates $|\chi_i\rangle$ with fidelity
arbitrarily close to 1 as $N \to \infty$. The crucial point is that this decoding also recreates
superpositions of input states with fidelity $\to 1$ as $N \to \infty$: by going to the ensemble picture,
it can be verified that $\sum_k \gamma_k |\chi_k\rangle$ is mapped to an ensemble $\{(\sum_k \gamma_k |\chi_k\rangle \otimes |\psi^j\rangle,\ p_j)\}$.
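The steps are as follows. First, encoding:

$$\sum_k \gamma_k |\chi_k\rangle \to \sum_k \gamma_k \sum_{i_1 \ldots i_N} \alpha^k_{i_1 \ldots i_N} |\phi_{i_1}\rangle \cdots |\phi_{i_N}\rangle . \qquad (3.3a)$$

Next, the effect of the channel:

$$\to \Big\{ \Big( \sum_k \gamma_k \sum_{i_1 \ldots i_N} \alpha^k_{i_1 \ldots i_N} |\psi_{j_1}(i_1)\rangle \cdots |\psi_{j_N}(i_N)\rangle ,\ p_{j_1}(i_1) \cdots p_{j_N}(i_N) \Big) \Big\} . \qquad (3.3b)$$

Finally, decoding:

$$\to \Big\{ \Big( \sum_k \gamma_k |\chi_k\rangle \otimes \sum_{i_1 \ldots i_N} \alpha_{i_1 \ldots i_N} |\psi_{j_1}(i_1)\rangle \cdots |\psi_{j_N}(i_N)\rangle ,\ p_{j_1}(i_1) \cdots p_{j_N}(i_N) \Big) \Big\} \qquad (3.3c)$$

$$= \Big\{ \Big( \sum_k \gamma_k |\chi_k\rangle \otimes |\psi^j\rangle ,\ p_j \Big) \Big\} . \qquad (3.3d)$$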

The fact that the decoding process faithfully recreates superpositions can also be verified in
the density matrix picture by using the correspondence in appendix 2. Since the encoding
and decoding preserve pure states with their phases, they also preserve mixed states and
any entanglement between the input state and another quantum system.

This proves the if part of the theorem. The only if part for codewords from the 
high-probability subspace was proved above. This proves the theorem as stated. 

The limits set by theorems 2 and 3 hold only for codewords from the high-probability 
set: by using codewords taken from the set of measure zero, it may be possible to improve 
on these limits. A simple example of how this may be done is given by the method of 
theorem 3 itself: block together the quantum symbols (e.g., qubits) in groups of $\ell$, and
regard each group of $\ell$ as a new, composite symbol. The minimization procedure used
for finding the quantum channel capacity in general yields a different, potentially higher 
channel capacity for codes composed of the composite symbols. 

Figure 1

[Diagram: Encoder → $C(|\psi\rangle)$ → Channel (noise enters; decoherence leaves to the environment) → $N(C(|\psi\rangle))$ → Decoder → $|\psi\rangle$ + noise]

Figure 1: Diagram of the noisy, decoherent quantum channel. To send an arbitrary quantum
state $|\psi\rangle$ down the channel, first encode it in a redundant form $C(|\psi\rangle)$. The encoded
state is sent down the channel, where it is subjected to noise and decoherence. The arrows
indicate that noise is added to the signal, while decoherence arises from the environment
getting information about the signal. The noisy, decoherent signal $N(C(|\psi\rangle))$ is then fed
through a decoder that recreates the original state together with extra random information
that depends on what errors occurred.